Building a Tree-Bank of Modern Hebrew Text
Abstract
This paper describes the process of building the first tree-bank for Modern Hebrew texts. A major concern in this process is reducing the cost of manual annotation through automatic means. To this end, the joint utility of an automatic morphological analyzer, a probabilistic parser and a small manually annotated tree-bank was explored. An initial tree-bank consisting of 500 annotated sentences from a daily newspaper is described. The annotation scheme underlying the tree-bank analyses integrates morphology and syntax. An existing morphological analyzer and a language-independent probabilistic parser were applied to this tree-bank. Based on the results of experiments with these tools, a semi-automatic procedure for future enlargement of the tree-bank is outlined.
Publication date: 2001